Using the console vs. Scripts vs. Markdown vs. R Notebooks

Welcome to using R. This subsection will explain the basics about R. First, lets discuss the different ways to write R code, as there are (at least) 5:

The Console

At the bottom left of RStudio you should have a console that looks something like what’s highlighted in red below:

You can type straight into the console, to get a result. You can scroll through your previous commands by pressing the up arrow in the console. Each time code is run in the console it updates the environment in the top right of R-Studio:

Scripts

The word “script” can be interpreted specifically, to refer to a type of R file that includes a lot of code, or generally to refer to any file that includes both R code and code that allows you make a nice looking report. In this subsection, we will be focusing on “script” as a particular type of file. To create a script, click on File -> New File -> R Script

You will then be shown a blank script, in which you can write a series of functions, and then run them. To run the lines of code, select a line, and then press CTRL-ENTER, or highlight a chunk of code and then press CTRL-ENTER. In either case, the code will be sent to the console and run there.

An advantage of a script over just using the console is that you can analyse your data in both structured and complex ways which is difficult if you are typing code directly into the console.

R Markdown

As highlighted above, R Markdown is a type of “script” in the general sense of the word, but allows you to create beautiful .html notebooks (.html files are what internet pages are based on). You are in fact reading an example of what can be produced by R Markdown (and R Notebooks). To make an R Markdown file, click on File -> New File -> R Markdown. You will be asked for a title, author and what output you would like. I would suggest “first markdown”, your name and “html” as the respective answers. You should then see something like:

The following points apply to both R Markdown and R Notebooks

If you look above, you may notice that there are 2 types of code: Markdown (to write a nice looking report) and R (in grey chunks). I think these are well explained here: https://www.rstudio.com/wp-content/uploads/2015/02/rmarkdown-cheatsheet.pdf so I’ll just explain that R Markdowns run all the code in the chunks each time they generate the output file (e.g. html file). This is important to know, because R Notebooks do not do this.

R Notebooks

R Notebooks can be created by clicking on File -> New File -> R Notebook. They look quite similar to R Markdowns, but automatically generate the .html output each time you save the notebook. The output file will be a .nb.html file in the same folder as your notebook.

Very importantly - the .nb.html file will be built based on what happened the last time you run each R chunk. If you never ran the R Chunk, then the nb.html file will not use the output from that chunk. This makes R Notebooks quicker than R Markdowns, because you don’t have to generate the output from scratch each time, but there is a risk of you forgetting to run a chunk after changing it.

Fundamental’s of R

R allows you to complete calculations, so lets start with that. Type into your markdown or script

Often, you can use R as a calculator, for example:

5 + 2
[1] 7

It’s helpful to store these calculations into objects:

a <- 5 + 2  # this is exactly the same as writing a = 5 + 2, but <- is encouraged in R to avoid confusion with other uses of = (e.g. == operator when you are comparing if two values are identical)
# this # is starting a comment, code that will be ignored but allows you to annotate your work
a # to show what the value 5 is now stored in the object "a"
[1] 7

This means that you can compare objects to each other later

b <- 2
b - a
[1] -5
c <- b-a
c # to show what c is
[1] -5

Some advice/rules for Objects:

this_object <- 5
this_object # to show what the object value is
[1] 5
this_object <- 10 # overwrites this_object
this_object # should now show 10
[1] 10

Functions

In a variety of coding languages like R, functions are lines of code you can apply each time you call the function, and apply it to input to get an output. If you wanted to make a function that multiplied numbers by 2, it would look something like:

times_two <- function(input){  # Define your function by stating it's name, and then using <- to describe it
  output = input * 2           # creates an output object that is the input object (see line above) x 2
  return (output)              # gives the output back to the user when they run the function
}
times_two(4)                   # should give you 8
[1] 8

The great news is that you don’t need to write functions 99% of the time in R, there are a wide variety of functions that are available. Some of which will be used in the next section.

Types of Objects

Vectors

Vectors store a series of values. Often these will be a series of numbers:

numeric_vector = c(1,2,3)
numeric_vector
[1] 1 2 3

The “c” above is short for “combine” as you’re combining values together to make this vector. Strings are values that have characters (also known as letters) in them. Lets see if we can make a vector of strings:

string_vector = c("a", "b", "c")
string_vector
[1] "a" "b" "c"

Looks like we can. But what happens if you mix strings and numbers in a vector:

mixed_vector = c("a", 1, "c")
mixed_vector
[1] "a" "1" "c"

R seems to be happy to put them into a single vector. But there are different types of values and vectors, so lets ask R what each type of (using the “typeof” function) vectors we have above:

typeof(numeric_vector)
[1] "double"
typeof(string_vector)
[1] "character"
typeof(mixed_vector)
[1] "character"

The numeric vector is a “double” vector. Double refers to the fact that the numbers can include decimals, as opposed to integer numbers which have to be whole numbers. Interestingly, R has assumed the list of numbers should be double rather than integer, which seems like the more robust thing to do, as integer numbers can always be double, but double numbers can’t always be integers.

Data frames

Data frames look like tables, in which you have a series of columns with headers that describe each column.

Some data frames are already loaded into RStudio when you run it, such as the “mpg” dataframe. To look at it, just type in it’s name:

mpg

Note that you may see 2 tables above, but they should be identical if so

The mpg dataframe has information about a variety of cars, their manufacturers, models, as described https://ggplot2.tidyverse.org/reference/mpg.html. You will need to refer to data frames and their columns, the convention for this being to write data frame$column. Lets do this to see what’s in the “manufacturer” column:

mpg$manufacturer
  [1] "audi"       "audi"       "audi"       "audi"       "audi"       "audi"       "audi"       "audi"      
  [9] "audi"       "audi"       "audi"       "audi"       "audi"       "audi"       "audi"       "audi"      
 [17] "audi"       "audi"       "chevrolet"  "chevrolet"  "chevrolet"  "chevrolet"  "chevrolet"  "chevrolet" 
 [25] "chevrolet"  "chevrolet"  "chevrolet"  "chevrolet"  "chevrolet"  "chevrolet"  "chevrolet"  "chevrolet" 
 [33] "chevrolet"  "chevrolet"  "chevrolet"  "chevrolet"  "chevrolet"  "dodge"      "dodge"      "dodge"     
 [41] "dodge"      "dodge"      "dodge"      "dodge"      "dodge"      "dodge"      "dodge"      "dodge"     
 [49] "dodge"      "dodge"      "dodge"      "dodge"      "dodge"      "dodge"      "dodge"      "dodge"     
 [57] "dodge"      "dodge"      "dodge"      "dodge"      "dodge"      "dodge"      "dodge"      "dodge"     
 [65] "dodge"      "dodge"      "dodge"      "dodge"      "dodge"      "dodge"      "dodge"      "dodge"     
 [73] "dodge"      "dodge"      "ford"       "ford"       "ford"       "ford"       "ford"       "ford"      
 [81] "ford"       "ford"       "ford"       "ford"       "ford"       "ford"       "ford"       "ford"      
 [89] "ford"       "ford"       "ford"       "ford"       "ford"       "ford"       "ford"       "ford"      
 [97] "ford"       "ford"       "ford"       "honda"      "honda"      "honda"      "honda"      "honda"     
[105] "honda"      "honda"      "honda"      "honda"      "hyundai"    "hyundai"    "hyundai"    "hyundai"   
[113] "hyundai"    "hyundai"    "hyundai"    "hyundai"    "hyundai"    "hyundai"    "hyundai"    "hyundai"   
[121] "hyundai"    "hyundai"    "jeep"       "jeep"       "jeep"       "jeep"       "jeep"       "jeep"      
[129] "jeep"       "jeep"       "land rover" "land rover" "land rover" "land rover" "lincoln"    "lincoln"   
[137] "lincoln"    "mercury"    "mercury"    "mercury"    "mercury"    "nissan"     "nissan"     "nissan"    
[145] "nissan"     "nissan"     "nissan"     "nissan"     "nissan"     "nissan"     "nissan"     "nissan"    
[153] "nissan"     "nissan"     "pontiac"    "pontiac"    "pontiac"    "pontiac"    "pontiac"    "subaru"    
[161] "subaru"     "subaru"     "subaru"     "subaru"     "subaru"     "subaru"     "subaru"     "subaru"    
[169] "subaru"     "subaru"     "subaru"     "subaru"     "subaru"     "toyota"     "toyota"     "toyota"    
[177] "toyota"     "toyota"     "toyota"     "toyota"     "toyota"     "toyota"     "toyota"     "toyota"    
[185] "toyota"     "toyota"     "toyota"     "toyota"     "toyota"     "toyota"     "toyota"     "toyota"    
[193] "toyota"     "toyota"     "toyota"     "toyota"     "toyota"     "toyota"     "toyota"     "toyota"    
[201] "toyota"     "toyota"     "toyota"     "toyota"     "toyota"     "toyota"     "toyota"     "volkswagen"
[209] "volkswagen" "volkswagen" "volkswagen" "volkswagen" "volkswagen" "volkswagen" "volkswagen" "volkswagen"
[217] "volkswagen" "volkswagen" "volkswagen" "volkswagen" "volkswagen" "volkswagen" "volkswagen" "volkswagen"
[225] "volkswagen" "volkswagen" "volkswagen" "volkswagen" "volkswagen" "volkswagen" "volkswagen" "volkswagen"
[233] "volkswagen" "volkswagen"

Packages

Whilst a lot of the functions you will need are in the base code that is active by default, you will at times need extra packages of code to do more powerful things. A commonly used package is ggplot2 [https://ggplot2.tidyverse.org/], which allows you to make beautiful figures in R. To use ggplot2 use need to install it and then load it from the library

if(!require(ggplot2)){install.packages("ggplot2")}
library(ggplot2)

Now that you have a package for making beautiful plots, less learn about “intelligent copy and paste” to make use of it.

Intelligent copy and paste

People experienced with coding do not write all their code from memory. They often copy and paste code from the internet and/or from their old scripts. So, assuming you’ve installed and loaded ggplot2 as described above, lets copy and paste code from their website (as of September 2022; https://ggplot2.tidyverse.org/)

ggplot(mpg, aes(displ, hwy, colour = class)) + 
  geom_point()

Good news is that we have a nice looking figure. But now we need to work out how to understand the code we’ve copied so that you can apply it to your own scripts. There’s a lot to unpack, so making the code more vertical can help you break it down and comment it out. Using the below and a description of the mpg dataframe (https://ggplot2.tidyverse.org/reference/mpg.html), can you comment it out

ggplot(
  mpg,              #
  aes(              #
    displ,          #
    hwy,            #
    colour = class  #
  )
) + 
geom_point()        #

Here’s how I would comment it out:

ggplot(
  mpg,              # dataframe
  aes(              # aesthetic properties
    displ,          # x-axis
    hwy,            # y-axis
    colour = class  # which column I will base the color on (often "color" is safer spelling in code)
  )
) + 
geom_point()        # what I would like drawn on

Now to understand the above code, try running it after changing lines. For example, what happens if you change the x-axis:

ggplot(
  mpg,              # dataframe
  aes(              # aesthetic properties
    cty,            # x-axis - updated
    hwy,            # y-axis
    colour = class  # which column I will base the color on (often "color" is safer spelling in code)
  )
) + 
geom_point()        # what I would like drawn on

To make beautiful figures in R, you can largely google the type of plot you want, copy the example code that the website has, and then swap in the relevant features for your plot. This principle of copying and pasting code, (making it vertical to make it legible is not necessary, but can be helpful), and then editing it to work for your own script is an essential skill to speed up your coding.

---
title: "R basics"
output: html_notebook
---

## Using the console vs. Scripts vs. Markdown vs. R Notebooks

Welcome to using R. This subsection will explain the basics about R. First, lets discuss the different ways to write R code, as there are (at least) 5:

- type it directly into the console (generally not recommended)
- save it as a script (better) ***note that script can refer to any of the below, but in this case is being used to describe a script that doesn't generate a notebook***
- save it as an R Markdown (better still) - this allows you to make beautiful documents
- save it as an R Notebook (arguably better than R Markdown) - this allows you to make beautiful documents, and is quicker


### The Console

At the bottom left of RStudio you should have a console that looks something like what's highlighted in <span style="color:red">red</span> below:

<img src="rBasics/console.png">

You can type straight into the console, to get a result. You can scroll through your previous commands by pressing the up arrow in the console. Each time code is run in the console it updates the environment in the top right of R-Studio:

<img src="rBasics/environment.png"/>

### Scripts

The word "script" can be interpreted specifically, to refer to a type of R file that includes a lot of code, or generally to refer to any file that includes both R code and code that allows you make a nice looking report. In this subsection, we will be focusing on "script" as a particular type of file. To create a script, click on **File -> New File -> R Script**

<img src="rBasics/newScript.png"/>

You will then be shown a blank script, in which you can write a series of functions, and then run them. To run the lines of code, select a line, and then press CTRL-ENTER, or highlight a chunk of code and then press CTRL-ENTER. In either case, the code will be sent to the console and run there. 

An advantage of a script over just using the console is that you can analyse your data in both *structured and complex ways* which is difficult if you are typing code directly into the console.

### R Markdown

As highlighted above, R Markdown is a type of "script" in the general sense of the word, but allows you to create beautiful .html notebooks (.html files are what internet pages are based on). You are in fact reading an example of what can be produced by R Markdown (and R Notebooks). To make an R Markdown file, click on **File -> New File -> R Markdown**. You will be asked for a title, author and what output you would like. I would suggest "first markdown", your name and "html" as the respective answers. You should then see something like:

<img src="rBasics/markdown.png"/>

**The following points apply to both R Markdown and R Notebooks**

If you look above, you may notice that there are 2 types of code: Markdown (to write a nice looking report) and R (in grey chunks). I think these are well explained here: https://www.rstudio.com/wp-content/uploads/2015/02/rmarkdown-cheatsheet.pdf so I'll just explain that R Markdowns run all the code in the chunks each time they generate the output file (e.g. html file). This is important to know, because R Notebooks do **not** do this.

### R Notebooks

R Notebooks can be created by clicking on **File -> New File -> R Notebook**. They look quite similar to R Markdowns, but automatically generate the .html output **each time you save the notebook**. The output file will be a **.nb.html** file in the same folder as your notebook. 

***Very importantly*** - the *.nb.html* file will be built based on what happened the last time you run each R chunk. If you never ran the R Chunk, then the nb.html file will not use the output from that chunk. **This makes R Notebooks quicker than R Markdowns, because you don't have to generate the output from scratch each time, but there is a risk of you forgetting to run a chunk after changing it**.

## Fundamental's of R

R allows you to complete calculations, so lets start with that. Type into your markdown or script

Often, you can use R as a calculator, for example:
```{r}
5 + 2
```

It's helpful to store these calculations into **objects**:
```{r}
a <- 5 + 2  # this is exactly the same as writing a = 5 + 2, but <- is encouraged in R to avoid confusion with other uses of = (e.g. == operator when you are comparing if two values are identical)
# this # is starting a comment, code that will be ignored but allows you to annotate your work
a # to show what the value 7 is now stored in the object "a"
```


This means that you can compare objects to each other later

```{r}
b <- 2
b - a
c <- b-a
c # to show what c is
```

Some advice/rules for **Objects**:

- You cannot have a space in an object name. "my object" would be an invalid object name.
- object names are ***case-sensitive***, so if your object is called "MyObject" you cannot refer to it as "myobject".
- "." and "_" are acceptable characters in variable names.
- you can overwrite objects in the same way that you define an object

```{r}
this_object <- 5
this_object # to show what the object value is
this_object <- 10 # overwrites this_object
this_object # should now show 10
```

- be careful to not give an object the same name as a function! This will overwrite the function. To check if the name already exists, you can start typing it and press tab. So typing "t.te" and pressing the tab will give you "t.test"


### Functions

In a variety of coding languages like R, **functions** are lines of code you can apply each time you call the function, and apply it to **input** to get an **output**. If you wanted to make a function that multiplied numbers by 2, it would look something like:

```{r}
times_two <- function(input){  # Define your function by stating it's name, and then using <- to describe it
  output = input * 2           # creates an output object that is the input object (see line above) x 2
  return (output)              # gives the output back to the user when they run the function
}
times_two(4)                   # should give you 8
```

The **great news** is that you don't need to write functions 99% of the time in R, there are a wide variety of functions that are available. Some of which will be used in the next section.

### Types of Objects

#### Vectors

**Vectors** store a series of values. Often these will be a series of numbers:

```{r}
numeric_vector = c(1,2,3)
numeric_vector
```

The "c" above is short for "combine" as you're combining values together to make this vector. 
**Strings** are values that have characters (also known as letters) in them. Lets see if we can make a vector of **strings**:

```{r}
string_vector = c("a", "b", "c")
string_vector
```

Looks like we can. But what happens if you mix strings and numbers in a **vector**:

```{r}
mixed_vector = c("a", 1, "c")
mixed_vector
```

R seems to be happy to put them into a single vector. But there are different types of values and vectors, so lets ask R what each type of (using the "typeof" **function**) vectors we have above:

```{r}
typeof(numeric_vector)
typeof(string_vector)
typeof(mixed_vector)
```

The numeric vector is a "double" vector. **Double** refers to the fact that the numbers *can include decimals*, as opposed to **integer** numbers *which have to be whole numbers*. Interestingly, R has assumed the list of numbers should be **double** rather than **integer**, which seems like the more robust thing to do, as integer numbers can always be double, but double numbers can't always be integers.

#### Data frames

**Data frames** look like tables, in which you have a series of columns with **headers** that describe each column. 

Some data frames are already loaded into RStudio when you run it, such as the "mpg" dataframe. To look at it, just type in it's name:

```{r}
mpg
```
<img src="rBasics/mpgTable.png"/>
*Note that you may see 2 tables above, but they should be identical if so*

The **mpg** dataframe has information about a variety of cars, their manufacturers, models, as described https://ggplot2.tidyverse.org/reference/mpg.html. You will need to refer to data frames and their columns, the convention for this being to write data frame$column. Lets do this to see what's in the "manufacturer" column:

```{r}
mpg$manufacturer
```


### Packages

Whilst a lot of the functions you will need are in the *base* code that is active by default, you will at times need extra packages of code to do more powerful things. A commonly used package is ggplot2 [https://ggplot2.tidyverse.org/], which allows you to make ***beautiful*** figures in R. To use ggplot2 use need to **install** it and then load it from the **library**

```{r}
if(!require(ggplot2)){install.packages("ggplot2")}
library(ggplot2)
```

Now that you have a package for making beautiful plots, less learn about "intelligent copy and paste" to make use of it.

### Intelligent copy and paste

People experienced with coding ***do not write all their code from memory***. They often copy and paste code from the internet and/or from their old scripts. So, assuming you've installed and loaded ggplot2 as described above, lets copy and paste code from their website (as of September 2022; https://ggplot2.tidyverse.org/)

```{r}
ggplot(mpg, aes(displ, hwy, colour = class)) + 
  geom_point()
```
Good news is that we have a nice looking figure. But now we need to work out how to understand the code we've copied so that you can apply it to your own scripts. There's a lot to unpack, so making the code more **vertical** can help you break it down and comment it out. Using the below and a description of the mpg dataframe (https://ggplot2.tidyverse.org/reference/mpg.html), can you comment it out

```{r}
ggplot(
  mpg,              #
  aes(              #
    displ,          #
    hwy,            #
    colour = class  #
  )
) + 
geom_point()        #
```
Here's how I would comment it out:

```{r}
ggplot(
  mpg,              # dataframe
  aes(              # aesthetic properties
    displ,          # x-axis
    hwy,            # y-axis
    colour = class  # which column I will base the color on (often "color" is safer spelling in code)
  )
) + 
geom_point()        # what I would like drawn on
```

Now to understand the above code, try running it after changing lines. For example, what happens if you change the x-axis:

```{r}
ggplot(
  mpg,              # dataframe
  aes(              # aesthetic properties
    cty,            # x-axis - updated
    hwy,            # y-axis
    colour = class  # which column I will base the color on (often "color" is safer spelling in code)
  )
) + 
geom_point()        # what I would like drawn on

```
To make beautiful figures in R, you can largely google the type of plot you want, copy the example code that the website has, and then swap in the relevant features for your plot. This principle of copying and pasting code, (making it vertical to make it legible is not necessary, but can be helpful), and then editing it to work for your own script is an **essential skill** to speed up your coding.